29 research outputs found

    Topological Features In Cancer Gene Expression Data

    We present a new method for exploring cancer gene expression data based on tools from algebraic topology. Our method selects a small relevant subset from tens of thousands of genes while simultaneously identifying nontrivial higher order topological features, i.e., holes, in the data. We first circumvent the problem of high dimensionality by dualizing the data, i.e., by studying genes as points in the sample space. Then we select a small subset of the genes as landmarks to construct topological structures that capture persistent, i.e., topologically significant, features of the data set in its first homology group. Furthermore, we demonstrate that many members of these loops have been implicated in cancer biogenesis in the scientific literature. We illustrate our method on five different data sets belonging to brain, breast, leukemia, and ovarian cancers. Comment: 12 pages, 9 figures, appears in proceedings of Pacific Symposium on Biocomputing 201

    Comparative genomics reveals multiple pathways to mutualism for tick-borne pathogens

    Accelerated pipeline for DNA and amino acid sequences clustering

    Identification of Anaplasma marginale Type IV Secretion System Effector Proteins

    Anaplasma marginale, an obligate intracellular alphaproteobacterium in the order Rickettsiales, is a tick-borne pathogen and the leading cause of anaplasmosis in cattle worldwide. Complete genome sequencing of A. marginale revealed that it has a type IV secretion system (T4SS). The T4SS is one of seven known types of secretion systems utilized by bacteria, with the type III and IV secretion systems particularly prevalent among pathogenic Gram-negative bacteria. The T4SS is predicted to play an important role in the invasion and pathogenesis of A. marginale by translocating effector proteins across its membrane into eukaryotic target cells. However, T4SS effector proteins have not been identified and tested in the laboratory until now. Published copy: Lockwood, S., D. E. Voth, K. A. Brayton, P. A. Beare, W. C. Brown, R. A. Heinzen, and S. L. Broschat, Identification of Anaplasma marginale type IV secretion system effector proteins, PLoS ONE, Vol. 6, No. 11, e27724, Nov. 2011. DOI: 10.1371/journal.pone.0027724

    Whole Proteome Clustering of 2,307 Proteobacterial Genomes Reveals Conserved Proteins and Significant Annotation Issues

    We clustered 8.76 M protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes, resulting in 707,311 clusters of one or more sequences, of which 224,442 ranged in size from 2 to 2,894 sequences. To our knowledge this is the first study of this scale. We were surprised to find that no single cluster contained a representative sequence from all the organisms in the study. Given the minimal genome concept, we expected to find a shared set of proteins. To determine why the clusters did not have universal representation we chose four essential proteins, the chaperonin GroEL, DNA dependent RNA polymerase subunits beta and beta′ (RpoB/RpoB′), and DNA polymerase I (PolA), representing fundamental cellular functions, and examined their cluster distribution. We found these proteins to be remarkably conserved with certain caveats. Although the groEL gene was universally conserved in all the organisms in the study, the protein was not represented in all the deduced proteomes. The genes for RpoB and RpoB′ were missing from two genomes and merged in 88, and the sequences were sufficiently divergent that they formed separate clusters for 18 RpoB proteins (seven clusters) and 14 RpoB′ proteins (three clusters). For PolA, 52 organisms lacked an identifiable sequence, and seven sequences were sufficiently divergent that they formed five separate clusters. Interestingly, organisms lacking an identifiable PolA and those with divergent RpoB/RpoB′ were predominantly endosymbionts. Furthermore, we present a range of examples of annotation issues that caused the deduced proteins to be incorrectly represented in the proteome. These annotation issues made our task of determining protein conservation more difficult than expected and also represent a significant obstacle for high-throughput analyses.

    Evidence of superficial knowledge regarding antibiotics and their use: Results of two cross-sectional surveys in an urban informal settlement in Kenya

    We assessed knowledge and practices related to antibiotic use in Kibera, an urban informal settlement in Kenya. Surveys were administered at the beginning (entry) and again at the end (exit) of a 5-month longitudinal study of antimicrobial resistance (AMR). Two hundred households were interviewed at entry, of which 149 were also interviewed at exit. The majority (>65%) of respondents in both surveys could name at least one antibiotic, with amoxicillin and cotrimoxazole jointly accounting for 85% and 77% of antibiotics mentioned during entry and exit, respectively. More than 80% of respondents felt antibiotics should not be shared or discontinued following the alleviation of symptoms. Nevertheless, 66% and 74% of respondents considered antibiotics effective for treating colds and flu in the entry and exit surveys, respectively. There was a high (87%, entry; 70%, exit) level of reported antibiotic use (past 12 months), mainly for colds/flu, coughs and fever, with >80% of respondents obtaining antibiotics from health facilities and pharmacies. Less than half of respondents remembered getting information on the correct use of antibiotics, although 100% of those who did reported improved attitudes towards antibiotic use. Clinicians and community pharmacists were highly trusted information sources. Paired household responses (n = 149) generally showed improved knowledge and attitudes by the exit survey, although practices were largely unchanged. Weak agreement (κ = -0.003 to 0.22) between survey responses suggests both that unintended learning had not occurred, and that participant responses were not based on established knowledge or behaviors. Targeted public education regarding antibiotics is needed to address this gap.

    Applications and Extensions of pClust to Big Microbial Proteomic Data

    The goal of biological sciences is to understand the biomolecular mechanics of living organisms. Proteins serve as the foundation for the functional analysis of organisms, and sequence analysis has proven invaluable in answering questions about individual organisms. The first step in any sequence analysis is alignment, and it is common that even modestly sized studies involve hundreds of thousands of protein sequences. In multigenome studies, the time consideration for sequence alignment becomes paramount, and heuristic algorithms are frequently used, sacrificing accuracy for speedup. At the same time, new algorithms have appeared that not only provide highly efficient performance but also guarantee to deliver optimal solutions. However, the adoption of these algorithms is hindered by the absence of a generalized analysis pipeline as well as of user-friendly computational tools. In this dissertation we present applications of existing, computationally efficient algorithms to multigenome studies where we apply our developed pClust pipeline to various sets of microbial organisms. The computational time is significantly improved and the results are more accurate than those obtained by traditional methods. The first study is a baseline comparison study on a small set of 11 microorganisms. It compares pClust results to the existing scientific knowledge and finds them to be consistent while at the same time providing new insights. The second study addresses the question of identification of common tick-transmissibility mechanisms across different species. It involves a larger set of 108 microbial genomes with approximately 127K protein sequences. Traditionally, a study of such scope would have required days or at least hours of CPU time on high-performance computers to produce all-versus-all sequence alignment. Using pClust it took less than 10 minutes on a desktop computer to perform sequence alignment and clustering. For this study we also developed a graphical user interface for pClust in order to make the new algorithms more accessible for use by microbiologists. The third study analyzes the set of all proteobacterial genomes. The study comprised 2326 complete genomes containing 8.7M protein sequences. The alignment was performed using the pGraph-Tascel algorithm on high-performance computers. This is the first study of its kind.